
Keyword Search Result

[Keyword] reinforcement learning (72 hits)

Results 21-40 of 72 hits

  • Optimal Planning of Emergency Communication Network Using Deep Reinforcement Learning Open Access

    Changsheng YIN  Ruopeng YANG  Wei ZHU  Xiaofei ZOU  Junda ZHANG  

     
    PAPER-Network

    Publicized: 2020/06/29 | Vol: E104-B No:1 | Page(s): 20-26

    To address the heavy reliance on prior knowledge and the poor timeliness of traditional algorithms, this paper proposes an emergency communication network topology planning method based on deep reinforcement learning. Drawing on the game of chess and on the characteristics of emergency communication networks, we map the node layout and topology planning problems onto a chess game; network coverage and connectivity are used as the two evaluation criteria for network planning; and Monte Carlo tree search combined with self-play is used to generate network planning sample data. Policy and value networks for network planning are designed based on a residual network, and the model is built and trained with the TensorFlow library. Simulation results show that the proposed method can effectively realize intelligent planning of the network topology with excellent timeliness and feasibility.

  • Towards Interpretable Reinforcement Learning with State Abstraction Driven by External Knowledge

    Nicolas BOUGIE  Ryutaro ICHISE  

     
    PAPER-Artificial Intelligence, Data Mining

    Publicized: 2020/07/03 | Vol: E103-D No:10 | Page(s): 2143-2153

    Advances in deep reinforcement learning have demonstrated its effectiveness in a wide variety of domains. Deep neural networks can approximate value functions and policies in complex environments. However, they inherit a number of drawbacks. Their lack of interpretability limits their usability in many safety-critical real-world scenarios, and they rely on huge amounts of data to learn efficiently, which may be acceptable in simulated tasks but restricts their use in many real-world applications. Finally, they generalize poorly: they struggle to recognize that a situation is similar to one encountered previously. We present a method that combines external knowledge with interpretable reinforcement learning. We derive a rule-based variant of the Sarsa(λ) algorithm, which we call Sarsa-rb(λ), that augments data with prior knowledge and exploits similarities among states. We demonstrate that our approach leverages small amounts of prior knowledge to significantly accelerate learning in multiple domains such as trading and visual navigation. The resulting agent provides substantial gains in training speed and performance over deep Q-networks (DQN) and deep deterministic policy gradients (DDPG), and improves stability over proximal policy optimization (PPO).
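    As a rough illustration of the general idea (not the authors' construction), the sketch below shows tabular Sarsa(λ) in Python whose unseen state-action values are seeded from hypothetical (predicate, action, prior value) rules; the rule format and the seeding scheme are assumptions.

```python
import random
from collections import defaultdict

class SarsaLambdaRuleSeeded:
    """Tabular Sarsa(lambda) whose Q-values are initialised from prior rules.

    A rule is a (predicate, action, prior_value) triple; this encoding is an
    illustrative assumption, not the Sarsa-rb(lambda) construction itself.
    """

    def __init__(self, actions, rules, alpha=0.1, gamma=0.99, lam=0.9, eps=0.1):
        self.actions, self.rules = actions, rules
        self.alpha, self.gamma, self.lam, self.eps = alpha, gamma, lam, eps
        self.Q = {}                      # state-action values
        self.E = defaultdict(float)      # eligibility traces

    def q(self, s, a):
        if (s, a) not in self.Q:         # seed from the best matching rule
            self.Q[(s, a)] = max((v for pred, ra, v in self.rules
                                  if ra == a and pred(s)), default=0.0)
        return self.Q[(s, a)]

    def act(self, s):
        if random.random() < self.eps:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q(s, a))

    def update(self, s, a, r, s_next, a_next, done):
        target = r if done else r + self.gamma * self.q(s_next, a_next)
        delta = target - self.q(s, a)
        self.E[(s, a)] += 1.0            # accumulating traces
        for (si, ai), e in list(self.E.items()):
            self.Q[(si, ai)] = self.q(si, ai) + self.alpha * delta * e
            self.E[(si, ai)] = e * self.gamma * self.lam
        if done:
            self.E.clear()
```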

  • Hybrid of Reinforcement and Imitation Learning for Human-Like Agents

    Rousslan F. J. DOSSA  Xinyu LIAN  Hirokazu NOMOTO  Takashi MATSUBARA  Kuniaki UEHARA  

     
    PAPER-Artificial Intelligence, Data Mining

    Publicized: 2020/06/15 | Vol: E103-D No:9 | Page(s): 1960-1970

    Reinforcement learning methods achieve performance superior to that of humans in a wide range of complex tasks and uncertain environments. However, high performance is not the sole metric for practical use, such as in a game AI or autonomous driving. A highly efficient agent acts greedily and selfishly and is thus inconvenient for surrounding users, hence the demand for human-like agents. Imitation learning reproduces the behavior of a human expert and builds a human-like agent, but its performance is limited to that of the expert. In this study, we propose a training scheme that constructs a human-like and efficient agent by mixing reinforcement and imitation learning for discrete and continuous action space problems. The proposed hybrid agent achieves higher performance than a strict imitation learning agent and exhibits more human-like behavior, as measured by a human sensitivity test.
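    One common way to mix the two signals, sketched below under the assumption of a discrete-action policy network in PyTorch, is to add a behaviour-cloning cross-entropy term on expert demonstrations to a policy-gradient loss; the paper's actual mixing scheme may differ, and `bc_weight` is an illustrative knob.

```python
import torch
import torch.nn.functional as F

# Hedged sketch: policy-gradient term on on-policy data plus a behaviour-cloning
# term on expert demonstrations.  Not the paper's exact formulation.

def hybrid_loss(policy_net, states, actions, advantages,
                expert_states, expert_actions, bc_weight=0.5):
    # Policy-gradient term: -E[ log pi(a|s) * advantage ]
    logits = policy_net(states)
    log_probs = F.log_softmax(logits, dim=-1)
    chosen = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
    pg_loss = -(chosen * advantages).mean()

    # Imitation term: cross-entropy against the expert's actions
    expert_logits = policy_net(expert_states)
    bc_loss = F.cross_entropy(expert_logits, expert_actions)

    return pg_loss + bc_weight * bc_loss

# Usage sketch: loss = hybrid_loss(net, s, a, adv, s_e, a_e); loss.backward(); opt.step()
```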

  • A Highly Reliable Compilation Optimization Passes Sequence Generation Framework

    Jiang WU  Jianjun XU  Xiankai MENG  Yan LEI  

     
    LETTER-Software System

    Publicized: 2020/06/22 | Vol: E103-D No:9 | Page(s): 1998-2002

    We propose a new framework named ROICF, based on reinforcement learning, for reliable compilation optimization sequence generation. Building on the standard LLVM optimization passes, ROICF derives program-specific phase orderings that improve program reliability.
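    A hedged sketch of how phase-ordering search can be framed as a learning problem is given below; ROICF's actual state encoding, reliability reward, and learning algorithm are described in the paper, and the pass list, `score_fn`, and bandit-style update here are assumptions.

```python
import random

# Rough sketch of phase-ordering search with a value estimate per (position, pass)
# pair.  `score_fn` is a placeholder for compiling with a pass sequence and
# measuring a reliability score; it is not a real LLVM API.

PASSES = ["-mem2reg", "-instcombine", "-gvn", "-licm", "-loop-unroll", "-simplifycfg"]

def phase_ordering_search(score_fn, episodes=200, length=6, eps=0.2, alpha=0.1):
    Q = {(i, p): 0.0 for i in range(length) for p in PASSES}
    best_seq, best_score = None, float("-inf")
    for _ in range(episodes):
        seq = [random.choice(PASSES) if random.random() < eps
               else max(PASSES, key=lambda p: Q[(i, p)])
               for i in range(length)]
        score = score_fn(seq)
        for i, p in enumerate(seq):          # credit every chosen pass equally
            Q[(i, p)] += alpha * (score - Q[(i, p)])
        if score > best_score:
            best_seq, best_score = seq, score
    return best_seq, best_score
```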

  • Extendable NFV-Integrated Control Method Using Reinforcement Learning Open Access

    Akito SUZUKI  Ryoichi KAWAHARA  Masahiro KOBAYASHI  Shigeaki HARADA  Yousuke TAKAHASHI  Keisuke ISHIBASHI  

     
    PAPER-Network

    Publicized: 2020/01/24 | Vol: E103-B No:8 | Page(s): 826-841

    Network functions virtualization (NFV) enables telecommunications service providers to realize various network services by flexibly combining multiple virtual network functions (VNFs). To provide such services, an NFV control method should optimally allocate the VNFs onto physical networks and servers, taking into account the combination of objective functions and constraints for each metric defined for each VNF type, e.g., VNF placement and routing between VNFs. The NFV control method should also be extendable, allowing new metrics to be added or the combination of metrics to be changed. One approach to optimizing allocations is to construct an algorithm that solves the combined optimization problem all at once. However, this approach is not extendable, because the problem must be reformulated every time a new metric is added or the combination of metrics is changed. Another approach uses an extendable network-control architecture that coordinates multiple control algorithms, each specialized for an individual metric. However, to the best of our knowledge, no method has been developed that can optimize allocations through this kind of coordination. In this paper, we propose an extendable NFV-integrated control method that coordinates multiple control algorithms, together with an efficient coordination algorithm based on reinforcement learning. Finally, we evaluate the effectiveness of the proposed method through simulations.

  • Control of Discrete-Time Chaotic Systems with Policy-Based Deep Reinforcement Learning

    Junya IKEMOTO  Toshimitsu USHIO  

     
    PAPER-Nonlinear Problems

    Vol: E103-A No:7 | Page(s): 885-892

    The OGY method is one of the control methods for chaotic systems. It requires calculating a target periodic orbit embedded in the chaotic attractor, so it cannot be used when a precise mathematical model of the chaotic system cannot be identified. In that case, the delayed feedback control proposed by Pyragas is useful. However, even delayed feedback control needs the mathematical model to determine a feedback gain that stabilizes the periodic orbit. We therefore propose applying reinforcement learning to the design of a controller for chaotic systems. Reinforcement learning algorithms with deep neural networks have recently attracted much attention and make it possible to control complex systems. We propose a controller design method consisting of two steps: we first determine a region that includes a target periodic point, and then make the controller learn an optimal control policy for its stabilization. The controller explores its control policy efficiently only within that region.
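    The sketch below is an illustrative stand-in rather than the paper's method: it stabilises the fixed point of the chaotic logistic map with a small softmax policy trained by REINFORCE, applying control only inside a region around the target point in the spirit of the two-step design; the map, the discrete control inputs, and all constants are assumptions.

```python
import numpy as np

# Illustrative sketch: logistic map x' = a*x*(1-x), control applied only when the
# state is inside a region around the fixed point, policy trained by REINFORCE.

rng = np.random.default_rng(0)
a = 3.9
x_star = 1.0 - 1.0 / a                    # fixed point of the uncontrolled map
region = 0.05                             # step 1: region around the target point
actions = np.array([-0.01, 0.0, 0.01])    # small additive control inputs (assumed)
theta = np.zeros((3, 2))                  # softmax policy over features [1, x - x_star]
alpha, gamma = 0.05, 0.95

def policy(e):
    logits = theta @ np.array([1.0, e])
    p = np.exp(logits - logits.max())
    return p / p.sum()

for episode in range(1000):
    x, traj = rng.uniform(0.0, 1.0), []
    for t in range(200):
        e = x - x_star
        in_region = abs(e) < region       # step 2: learn the policy only in the region
        if in_region:
            p = policy(e)
            k = int(rng.choice(3, p=p))
            u = actions[k]
        else:
            u = 0.0
        x = float(np.clip(a * x * (1.0 - x) + u, 0.0, 1.0))
        if in_region:
            traj.append((e, k, p, -abs(x - x_star)))
    # REINFORCE over the in-region steps (returns discounted over those steps only)
    G = 0.0
    for e, k, p, r in reversed(traj):
        G = r + gamma * G
        feat = np.array([1.0, e])
        grad = -np.outer(p, feat)         # d log softmax: (one_hot(k) - p) * feat
        grad[k] += feat
        theta += alpha * G * grad
```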

  • Multi-Autonomous Robot Enhanced Ad-Hoc Network under Uncertain and Vulnerable Environment Open Access

    Ming FENG  Lijun QIAN  Hao XU  

     
    INVITED PAPER

    Publicized: 2019/04/26 | Vol: E102-B No:10 | Page(s): 1925-1932

    This paper studies the problem of real-time routing in a multi-autonomous-robot enhanced network at the uncertain and vulnerable tactical edge. Recent network protocols, such as opportunistic mobile network routing protocols, incorporate social networking into the communication network, increasing interoperability through social mobility and opportunistic carry-and-forward routing algorithms. However, in a practical harsh environment such as a battlefield, the uncertainty of social mobility and the complexity of a vulnerable environment, due to unpredictable physical and cyber attacks from the enemy, seriously affect the effectiveness and practicality of these emerging network protocols. This paper presents GT-SaRE-MANET (Game Theoretic Situation-aware Robot Enhanced Mobile Ad-hoc Network), a routing protocol that adopts online reinforcement learning to supervise the mobility of multiple robots and to handle uncertainty and potential physical and cyber attacks at the tactical edge. First, a set of game-theoretic mission-oriented metrics is introduced to describe the interrelation among network quality, multi-robot mobility, and potential attacking activities. Then, a distributed multi-agent game-theoretic reinforcement learning algorithm is developed. It not only optimizes the GT-SaRE-MANET routing protocol and the mobility of the robots online, but also effectively avoids physical and/or cyber attacks from the enemy by using the game-theoretic mission-oriented metrics. The effectiveness of the proposed design is demonstrated through computer-aided simulations and hardware experiments.

  • Deep-Reinforcement-Learning-Based Distributed Vehicle Position Controls for Coverage Expansion in mmWave V2X

    Akihito TAYA  Takayuki NISHIO  Masahiro MORIKURA  Koji YAMAMOTO  

     
    PAPER-Network Management/Operation

    Publicized: 2019/04/17 | Vol: E102-B No:10 | Page(s): 2054-2065

    In millimeter wave (mmWave) vehicular communications, multi-hop relay disconnection by line-of-sight (LOS) blockage is a critical problem, particularly in the early diffusion phase of mmWave-available vehicles, where not all vehicles have mmWave communication devices. This paper proposes a distributed position control method to establish long relay paths through road side units (RSUs). This is realized by a scheme via which autonomous vehicles change their relative positions to communicate with each other via LOS paths. Even though vehicles with the proposed method do not use all the information of the environment and do not cooperate with each other, they can decide their action (e.g., lane change and overtaking) and form long relays only using information of their surroundings (e.g., surrounding vehicle positions). The decision-making problem is formulated as a Markov decision process such that autonomous vehicles can learn a practical movement strategy for making long relays by a reinforcement learning (RL) algorithm. This paper designs a learning algorithm based on a sophisticated deep reinforcement learning algorithm, asynchronous advantage actor-critic (A3C), which enables vehicles to learn a complex movement strategy quickly through its deep-neural-network architecture and multi-agent-learning mechanism. Once the strategy is well trained, vehicles can move independently to establish long relays and connect to the RSUs via the relays. Simulation results confirm that the proposed method can increase the relay length and coverage even if the traffic conditions and penetration ratio of mmWave communication devices in the learning and operation phases are different.

  • Learning in Two-Player Matrix Games by Policy Gradient Lagging Anchor

    Shiyao DING  Toshimitsu USHIO  

     
    LETTER-Mathematical Systems Science

    Vol: E102-A No:4 | Page(s): 708-711

    It is known that the policy gradient algorithm cannot guarantee convergence to a Nash equilibrium in mixed policies when applied to matrix games. To overcome this problem, we propose a novel multi-agent reinforcement learning (MARL) algorithm called the policy gradient lagging anchor (PGLA) algorithm. We prove that the agents' policies converge to a Nash equilibrium in mixed policies under the PGLA algorithm in two-player two-action matrix games. By simulation, we confirm the convergence and also show that the PGLA algorithm converges better than the LR-I lagging anchor algorithm.
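    The sketch below illustrates the general lagging-anchor idea in matching pennies: each player's policy parameter receives a policy-gradient update plus a pull toward a slowly tracking anchor, which damps the cycling of plain policy gradient; the step sizes and exact update form are assumptions, not the paper's PGLA specification.

```python
import numpy as np

# Hedged sketch: policy gradient with a lagging anchor in matching pennies,
# a two-player two-action zero-sum game with a mixed Nash equilibrium at (0.5, 0.5).

rng = np.random.default_rng(1)
A = np.array([[1.0, -1.0], [-1.0, 1.0]])   # payoff matrix of player 1

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

v = np.zeros(2)        # v[i]: logit of player i playing action 0
anchor = np.zeros(2)   # lagging anchors
alpha, eta = 0.05, 0.02

for step in range(20000):
    p = sigmoid(v)                                   # prob. of action 0 for each player
    a1 = 0 if rng.random() < p[0] else 1
    a2 = 0 if rng.random() < p[1] else 1
    r = np.array([A[a1, a2], -A[a1, a2]])            # rewards for players 1 and 2
    for i, ai in enumerate((a1, a2)):
        grad_logp = (1.0 - p[i]) if ai == 0 else -p[i]   # d log pi(ai) / d v[i]
        v[i] += alpha * r[i] * grad_logp + eta * (anchor[i] - v[i])  # PG + anchor pull
        anchor[i] += eta * (v[i] - anchor[i])            # anchor lags behind the policy

print("mixed policies:", sigmoid(v))   # both probabilities should hover near 0.5
```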

  • A Robot Model That Obeys a Norm of a Human Group by Participating in the Group and Interacting with Its Members

    Yotaro FUSE  Hiroshi TAKENOUCHI  Masataka TOKUMARU  

     
    PAPER-Kansei Information Processing, Affective Information Processing

    Publicized: 2018/10/03 | Vol: E102-D No:1 | Page(s): 185-194

    Herein, we propose a robot model that learns to obey the norm of a group by interacting with the group members. Using this model, a robot system learns the group norm as a member of the group itself. People with individual differences form a group and a characteristic norm that reflects the members' personalities. When robots join a group that includes humans, they need to obey such a characteristic norm: a group norm. We investigated whether the robot system generates a decision-making criterion to obey group norms by learning from interactions through reinforcement learning. In our experiment, the human group members and the robot system answer the same simple quizzes, which can have several vague answers. When the group members initially answered differently from one another, we investigated whether they then answered the quizzes while considering the group norm. To avoid bias toward the system's answers, one participant in each group simply relays the system's answers, while the other participants are unaware of the system. Our experiments revealed that the group comprising the participants and the robot system forms group norms. The proposed model enables a social robot to make decisions socially and to adjust its behavior to common sense, not only in a large human society but also in smaller human groups, e.g., local communities. We therefore presume that such robots can join human groups by interacting with their members and adapting their behavior to the group. However, further studies are required to reveal whether the robot's answers affect people, and whether participants can form a group norm based on a robot's answer even when they know they are interacting in a group that includes a real robot. Moreover, some participants in a group do not know that another participant only relays the system's decisions and pretends to answer the questions, which prevents biased answers.

  • Reward-Based Exploration: Adaptive Control for Deep Reinforcement Learning

    Zhi-xiong XU  Lei CAO  Xi-liang CHEN  Chen-xi LI  

     
    LETTER-Artificial Intelligence, Data Mining

    Publicized: 2018/06/18 | Vol: E101-D No:9 | Page(s): 2409-2412

    To address the exploration-exploitation trade-off in deep reinforcement learning, this paper proposes a reward-based exploration strategy combined with Softmax action selection (RBE-Softmax) as a dynamic exploration strategy to guide the agent's learning. The strength of the proposed method is that characteristics of the agent's learning process are used to adapt the exploration parameters online, so the agent can select potentially optimal actions more effectively. The proposed method is evaluated on discrete and continuous control tasks in OpenAI Gym, and the empirical results show that RBE-Softmax yields a statistically significant improvement in the performance of deep reinforcement learning algorithms.
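    A hedged sketch of the flavour of such a strategy is shown below: Softmax action selection whose temperature is adapted online from the agent's recent returns; the actual adaptation rule of RBE-Softmax is defined in the paper, and the mapping used here is an assumption.

```python
import numpy as np

# Hedged sketch: explore more when returns stagnate, less when they improve.

rng = np.random.default_rng(0)

def softmax_action(q_values, temperature):
    z = np.asarray(q_values) / max(temperature, 1e-6)
    p = np.exp(z - z.max())
    p /= p.sum()
    return rng.choice(len(p), p=p)

class RewardBasedTemperature:
    """Keeps a running baseline of episode returns and maps the improvement over
    that baseline to a temperature in [t_min, t_max] (illustrative rule)."""

    def __init__(self, t_min=0.05, t_max=2.0, beta=0.1):
        self.t_min, self.t_max, self.beta = t_min, t_max, beta
        self.baseline = None

    def update(self, episode_return):
        if self.baseline is None:
            self.baseline = episode_return
        improvement = episode_return - self.baseline
        self.baseline += self.beta * (episode_return - self.baseline)
        frac = 1.0 / (1.0 + np.exp(improvement))   # more improvement -> lower temperature
        return self.t_min + frac * (self.t_max - self.t_min)
```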

  • Deep Reinforcement Learning with Sarsa and Q-Learning: A Hybrid Approach

    Zhi-xiong XU  Lei CAO  Xi-liang CHEN  Chen-xi LI  Yong-liang ZHANG  Jun LAI  

     
    PAPER-Artificial Intelligence, Data Mining

    Publicized: 2018/05/22 | Vol: E101-D No:9 | Page(s): 2315-2322

    The commonly used deep Q-network (DQN) is known to overestimate action values under certain conditions, and such overestimation has been shown to harm performance, potentially causing instability and divergence of learning. In this paper, we present the Deep Sarsa and Q Networks (DSQN) algorithm, which can be considered an enhancement of DQN. First, DSQN uses the experience replay and target network techniques of DQN to improve the stability of the neural networks. Second, a double estimator is used in Q-learning to reduce overestimation. In particular, we introduce Sarsa learning into the deep Q-network to further reduce overestimation. Finally, DSQN is evaluated on the cart-pole balancing, mountain car, and LunarLander control tasks from OpenAI Gym. The empirical results show that the proposed method leads to reduced overestimation, a more stable learning process, and improved performance.
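    A minimal sketch of a Sarsa/Q-learning hybrid target in the DQN setting is given below, combining a double-DQN-style target with a Sarsa-style target that evaluates the action actually taken next; how DSQN weights the two estimators is specified in the paper, so the `mix` parameter is an assumption.

```python
import torch

# Hedged sketch of target computation for a Sarsa/Q-learning hybrid with a
# target network; not the paper's exact formulation.

def hybrid_targets(online_net, target_net, rewards, next_states, next_actions,
                   dones, gamma=0.99, mix=0.5):
    with torch.no_grad():
        next_q_target = target_net(next_states)                       # [B, A]
        # Double-DQN term: online net selects the action, target net evaluates it
        greedy = online_net(next_states).argmax(dim=1, keepdim=True)  # [B, 1]
        q_dqn = next_q_target.gather(1, greedy).squeeze(1)
        # Sarsa term: evaluate the action the agent actually took at the next step
        q_sarsa = next_q_target.gather(1, next_actions.unsqueeze(1)).squeeze(1)
        bootstrap = mix * q_sarsa + (1.0 - mix) * q_dqn
        return rewards + gamma * (1.0 - dones.float()) * bootstrap
```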

  • Incremental Estimation of Natural Policy Gradient with Relative Importance Weighting

    Ryo IWAKI  Hiroki YOKOYAMA  Minoru ASADA  

     
    PAPER-Artificial Intelligence, Data Mining

    Publicized: 2018/06/01 | Vol: E101-D No:9 | Page(s): 2346-2355

    The step size is a parameter of fundamental importance in learning algorithms, particularly for natural policy gradient (NPG) methods. We derive an upper bound on the step size for incremental NPG estimation and propose an adaptive step size that implements the derived bound. The proposed adaptive step size guarantees that an updated parameter does not overshoot the target, which is achieved by weighting the learning samples according to their relative importance. We also provide tight upper and lower bounds on the step size, although they are not suitable for incremental learning. We confirm the usefulness of the proposed step size on classical benchmarks. To the best of our knowledge, this is the first adaptive step size method for NPG estimation.

  • A Real-Time Subtask-Assistance Strategy for Adaptive Services Composition

    Li QUAN  Zhi-liang WANG  Xin LIU  

     
    PAPER-Data Engineering, Web Information Systems

    Publicized: 2018/01/30 | Vol: E101-D No:5 | Page(s): 1361-1369

    Reinforcement learning has been applied to adaptive service composition, but traditional algorithms are not suitable for large-scale composition. Based on the Q-learning algorithm, we propose a multi-task-oriented algorithm named multi-Q learning that realizes a subtask-assistance strategy for large-scale, adaptive service composition. Unlike previous studies that focus on a single task, we take the relationships among multiple service composition tasks into account. We decompose a complex service composition task into multiple subtasks using graph theory, so that different tasks sharing the same subtasks can assist each other and improve their learning speed. Experimental results show that our algorithm learns obviously faster than the traditional Q-learning algorithm and converges faster than multi-agent Q-learning. Moreover, for all involved service composition tasks that share subtasks, our algorithm improves their speed of learning an optimal policy simultaneously and in real time.
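    The sketch below illustrates the subtask-assistance idea under the assumption that each subtask owns one Q-table shared by every composition task containing it; the actual decomposition and update schedule of multi-Q learning are described in the paper.

```python
from collections import defaultdict
import random

# Hedged sketch: tasks that contain the same subtask read and write the same
# Q-table, so progress on one task "assists" the others.

class SharedSubtaskQ:
    def __init__(self, alpha=0.2, gamma=0.95, eps=0.1):
        self.alpha, self.gamma, self.eps = alpha, gamma, eps
        # Q[subtask][(state, service)] -> value, shared across all tasks
        self.Q = defaultdict(lambda: defaultdict(float))

    def choose(self, subtask, state, candidate_services):
        table = self.Q[subtask]
        if random.random() < self.eps:
            return random.choice(candidate_services)
        return max(candidate_services, key=lambda srv: table[(state, srv)])

    def update(self, subtask, state, service, reward, next_state, next_candidates):
        table = self.Q[subtask]
        best_next = max((table[(next_state, srv)] for srv in next_candidates), default=0.0)
        td_target = reward + self.gamma * best_next
        table[(state, service)] += self.alpha * (td_target - table[(state, service)])
```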

  • A Study of Qualitative Knowledge-Based Exploration for Continuous Deep Reinforcement Learning

    Chenxi LI  Lei CAO  Xiaoming LIU  Xiliang CHEN  Zhixiong XU  Yongliang ZHANG  

     
    LETTER-Artificial Intelligence, Data Mining

    Publicized: 2017/07/26 | Vol: E100-D No:11 | Page(s): 2721-2724

    As an important method for solving sequential decision-making problems, reinforcement learning learns task policies through interaction with the environment, but it has difficulty scaling to large problems. One of the reasons is the exploration-exploitation dilemma, which can lead to inefficient learning. We present an approach that addresses this shortcoming by introducing qualitative knowledge into reinforcement learning, using cloud control systems to represent 'if-then' rules. We use these rules as a heuristic exploration strategy to guide action selection in deep reinforcement learning. Empirical evaluation shows that our approach significantly improves the learning process.

  • Relation Extraction with Deep Reinforcement Learning

    Hongjun ZHANG  Yuntian FENG  Wenning HAO  Gang CHEN  Dawei JIN  

     
    PAPER-Natural Language Processing

    Publicized: 2017/05/17 | Vol: E100-D No:8 | Page(s): 1893-1902

    In recent years, deep learning has been widely applied to relation extraction. Such methods use only word embeddings as network input and can model relations between target named-entity pairs. However, they treat every relation mention equally, so they cannot effectively extract relations from a corpus containing an enormous number of non-relations, which is the main reason the performance of relation extraction is significantly lower than that of relation classification. This paper designs a deep reinforcement learning framework for relation extraction that treats the task as a two-step decision-making game. The method models relation mentions with a CNN and a Tree-LSTM, which compute the initial state and the transition states of the game, respectively. In addition, we tackle the problem of an unbalanced corpus by designing a penalty function that increases the penalty for first-step decision errors. Finally, we use the Q-learning algorithm with value function approximation to learn a control policy π for the game. Experiments on the ACE2005 corpus show that the framework achieves state-of-the-art performance in relation extraction.

  • Optimal Digital Control with Uncertain Network Delay of Linear Systems Using Reinforcement Learning

    Taishi FUJITA  Toshimitsu USHIO  

     
    PAPER

    Vol: E99-A No:2 | Page(s): 454-461

    Recent developments in network technology make it possible to control a remote plant with a digital controller. However, data transmission of the control inputs and outputs introduces a delay, which degrades control performance if it is not taken into consideration, and identifying the delay beforehand is in general a difficult problem. We also assume that the plant's parameters are uncertain. To solve this problem, we use reinforcement learning to achieve optimal digital control. First, we consider state feedback control. Next, we consider the case where the plant's outputs are observed and apply reinforcement learning to output feedback control. Finally, we demonstrate by simulation that the proposed control method can find the optimal gain and adapt to changes in the delay.

  • Sarsa Learning Based Route Guidance System with Global and Local Parameter Strategy

    Feng WEN  Xingqiao WANG  

     
    PAPER-Intelligent Transport System

    Vol: E98-A No:12 | Page(s): 2686-2693

    A route guidance system is one of the essential components of a vehicle navigation system in ITS. In this paper, a centrally determined route guidance system is established to solve congestion problems. The Sarsa learning method is used to guide vehicles, and a global and local parameter strategy is proposed to adjust vehicle guidance by considering the whole traffic system and the local traffic environment, respectively. The proposed method reduces the average driving time and relieves traffic congestion. The evaluation was carried out using two cases on different road networks, and the experimental results show the efficiency and effectiveness of the proposed algorithm.

  • Adaptive Q-Learning Cell Selection Method for Open-Access Femtocell Networks: Multi-User Case

    Chaima DHAHRI  Tomoaki OHTSUKI  

     
    PAPER-Network Management/Operation

    Vol: E97-B No:8 | Page(s): 1679-1688

    Open-access femtocell networks assure the cellular user of a better and stronger signal. However, due to the small range of femto base stations (FBSs), any motion of the user may trigger a handover, and in a dense environment such handovers can become very frequent. To avoid frequent communication disruptions caused by phenomena such as the ping-pong effect, an effective cell selection method is necessary. Existing selection methods commonly use a measured channel/cell quality metric such as the channel capacity between the user and the target cell. However, the throughput experienced by the user is time-varying because of changing channel conditions, owing to propagation effects or receiver location, so the conventional approach does not reflect future performance. To ensure efficient cell selection, the user's decision needs to depend not only on the current state of the network but also on possible future states (the horizon). To this end, we implement a learning algorithm that can predict, based on past experience, the cell that will perform best in the future. In this paper we present a reinforcement learning (RL) framework as a generic solution to the cell selection problem in a non-stationary femtocell network; without prior knowledge about the environment, it selects a target cell by exploring the past behavior of cells and predicting their potential future states with a Q-learning algorithm. We then extend this proposal with a fuzzy inference system (FIS) that tunes the Q-learning parameters during the learning process to adapt to environmental changes. Our solution aims at minimizing the frequency of handovers without degrading the user experience in terms of channel capacity. Simulation results demonstrate that our solution comes very close to the performance of the opportunistic method in terms of capacity while requiring fewer handovers on average, and that the use of fuzzy rules achieves better performance in terms of received reward (capacity) and number of handovers than fixed Q-learning parameters.
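    A minimal sketch of Q-learning-based cell selection is given below: each candidate cell's value is learned from the capacity observed after attaching to it, and selection is ε-greedy over those values; the fuzzy tuning of the learning parameters described in the paper is omitted here, and fixed α and ε are used instead.

```python
import random
from collections import defaultdict

# Hedged sketch: per-cell value estimates learned from observed capacity.

class QCellSelector:
    def __init__(self, alpha=0.3, gamma=0.9, eps=0.1):
        self.alpha, self.gamma, self.eps = alpha, gamma, eps
        self.Q = defaultdict(float)          # Q[cell_id] -> predicted long-run capacity

    def select(self, candidate_cells):
        if random.random() < self.eps:
            return random.choice(candidate_cells)
        return max(candidate_cells, key=lambda c: self.Q[c])

    def update(self, cell, observed_capacity, next_candidates):
        # reward = capacity observed while attached to `cell`;
        # bootstrap on the best cell reachable afterwards
        best_next = max((self.Q[c] for c in next_candidates), default=0.0)
        target = observed_capacity + self.gamma * best_next
        self.Q[cell] += self.alpha * (target - self.Q[cell])
```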

  • An Intelligent Fighting Videogame Opponent Adapting to Behavior Patterns of the User

    Koichi MORIYAMA  Simón Enrique ORTIZ BRANCO  Mitsuhiro MATSUMOTO  Ken-ichi FUKUI  Satoshi KURIHARA  Masayuki NUMAO  

     
    PAPER-Information Network

    Vol: E97-D No:4 | Page(s): 842-851

    In standard fighting videogames, users usually prefer playing against other users rather than against machines, because machine-controlled opponents fall into a rut and users can memorize their behaviors after repeated plays. Human players, on the other hand, adapt to each other's behavior, which makes fighting videogames interesting. In this paper, we therefore propose an artificial agent for a fighting videogame that can adapt to its users, allowing them to enjoy the game even when playing alone. In particular, this work focuses on combination attacks, or combos, which deal great damage to the opponent. The agent treats combos independently: it is composed of a subagent for predicting the combos the user executes, a subagent for choosing the combos the agent executes, and a subagent for controlling the whole agent. Human users evaluated the agent against static opponents, and the agent received minimal negative ratings.

Results 21-40 of 72 hits